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A  UNIFYING  APPROACH  TO  LINEAR  AND  NONLINEAR 
ARRAY  PROCESSING:  A  TUTORIAL 


1.  INTRODUCTION 

Array  processing  is  the  extraction  of  useful  information  about  a  spatiotemporal  field  from  mea¬ 
surements  taken  using  an  array,  or  spatial  configuration,  of  sensors.  Practical  restrictions  on  the 
number  of  sensors  and  their  placement,  as  well  as  the  typical  low  signal-to-noise  ratio  (S/N),  have 
made  the  use  of  sophisticated  algorithms  to  process  the  limited  data  an  option  of  growing  importance, 
particularly  as  computer  power  increases  and  costs  decline.  The  purpose  of  this  report  is  to  present,  in 
a  unifying  fashion,  a  number  of  recently  developed  array  processing  methods. 

Methods  for  array  processing  fall  into  one  of  two  broad  categories,  which,  borrowing  terminology 
from  computing,  are  described  as  on-line  and  batch.  On-line  methods  process  in  real  time,  as  the  data 
are  received  and  any  adaptation  is  incorporated  through  feedback  loops.  Batch  methods  accept  as  data 
the  full  set  of  spatiotemporal  measurements,  usually  in  the  form  of  cross-sensor  correlation  matrices. 
Our  concern  here  is  exclusively  with  batch  methods. 

The  information  extracted  from  the  data  can  vary;  the  typical  view  is  that  the  data  contain  certain 
components,  signals,  which  can  be  described  well  using  a  small  number  of  parameters,  and  it  is  the 
values  of  these  parameters  that  we  seek.  What  the  parameters  are  will  depend  on  the  physical  model, 
although  much  of  the  development  presented  here  is  independent  of  the  actual  physical  model.  We  use 
a  planewave  model  in  most  cases,  but  this  is  not  essential,  either  to  the  implementation  or  to  the 
understanding  of  the  methods  discussed.  We  focus  on  bearing  estimation  and  speak  of  steering  or 
looking  in  a  particular  direction,  but  most  of  what  we  say  applies  to  the  estimation  of  other  parameters 
as  well. 

Because  many  of  the  methods  used  for  bearing  estimation  are  related  to  mathematical  procedures 
that  serve  other  purposes  (power  spectrum  estimation,  statistical  hypothesis  testing)  they  are  often  dis¬ 
cussed  in  the  literature  in  a  manner  that  obscures  their  relation  to  one  another  and  their  properties  with 
regard  to  array  processing.  For  example.  Burg’s  maximum  entropy  method  satisfies  a  number  of  infor¬ 
mation  theoretic  criteria,  but  what  does  it  do  to  array  data  and  why?  We  attempt,  therefore,  to  rederive 
many  of  the  well  known  linear  and  nonlinear  methods,  using  the  unifying  framework  of  the  linear 
filter.  This  is  a  natural  framework,  and  one  that  has  already  been  used  in  many  cases.  Put  simply,  we 
wish  to  linearly  transform  the  data  to  enhance  the  information-bearing  components  relative  to  the  oth¬ 
ers.  The  filters  differ  from  one  another  with  respect  to  the  criteria  of  optimality  that  are  imposed. 

Several  appendixes  are  included  in  this  report  to  help  make  this  tutorial  self-contained. 

We  assume  throughout  that  we  have  an  arbitrarily  configured  array  of  M  sensors,  in  vector  loca¬ 
tions  s„,,  m  -  1,  . . .  ,  Af,  that  the  received  time  series  at  each  sensor  is  Fourier  filtered  to  produce,  at 
each  successive  block  of  time,  a  data  vector  x  —  (jc(I),  ...  ,  x{M))^  of  single-frequency-bin  complex 
values.  By  limiting  the  methods  to  a  single  frequency,  we  can  represent  the  time  delays  associated 
with  planewave  arrival  directions  in  terms  of  phase  changes  between  sensors.  If  the  sensor  locations 
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take  the  form  s„  =  mA  v,  where  A  >  0  and  v  is  a  fixed  unit  vector,  the  array  will  be  called  a  uniform 
line  array.  If,  in  addition,  the  interelement  spacing  is  A  =  one-half  the  wavelength,  then  the  array  is 
called  Nyquist.  We  assume  throughout  that  A  is  at  most  1/2  wavelength,  to  avoid  the  problem  of  alias¬ 
ing. 

Most  of  the  methods  discussed  here  use  the  cross-sensor  correlation  matrix,  R  ~  <x  x^>, 
where  denotes  conjugate  transposition  and  <  >  indicates  averaging  over  the  x  associated  with  each 
time  block.  The  linear  filters  we  discuss  operate  on  each  x,  but  the  optimality  is  measured  in  terms  of 
average  performance,  hence  the  appearance  of  R. 

A  localized  farfield  source  at  frequency  oj,  with  wavevector  k  and  unit  amplitude,  produces  at  the 
array  the  "steering  vector" 

elk)  =  (exp  (—ioiu  ■  Si/c),  . . .  ,  exp  (—iatu  Sm/c))^,  (1.1) 

where  u  is  the  unit  vector  normal  to  the  planewave,  c  the  speed  of  propagation,  oj  the  (angular)  fre¬ 
quency  of  interest,  and  k  ==  (<d/c)u.  The  usual  model  for  x  is 

X  —  I.a(k)e(k)  +  n  ,  (1.2) 

where  each  a(k)  is  random,  see  Appendix  B,  «  is  a  noise  vector,  and  the  sum  is  taken  over  all  the 
physically  meaningful  k.  To  allow  us  to  use  matrix  notation,  we  restrict  k  to  some  large  but  finite  fam¬ 
ily;  this  is  not  a  significant  alteration  and  does  not  affect  the  subsequent  development.  If  the  signals  of 
interest  do  not  correspond  to  planewaves,  but  to  some  other  waveforms  given  by  the  physical  model, 
these  can  be  substituted  in  our  formulas  wherever  elk)  appears,  and  the  k  can  be  viewed  as  a  vector 
parameter. 

In  most  cases  the  a(k)are  zero  except  for  a  very  small  number  of  the  k  and  the  goal  is  to  deter¬ 
mine  how  many  are  nonzero  and  which  k  these  are.  A  straightforward  method  for  solving  this  problem 
is  based  on  the  simple  properties  of  the  dot  product  of  vectors.  To  illustrate,  suppose  x  =  elko)  for 
some  fixed  ko.  We  then  form  the  function  of  k  given  by 

=  < \elk)^x/MV>  =  eik)*R  elk)IM\  (1.3) 

This  function  attains  its  maximum  value  when  k  =  /co>  ffOTi  which  the  value  of  ko  can  be  ascertained. 
This  estimation  P\  is  called  the  conventional  estimator.  If  x  consists  of  several  of  the  elk),  plus  noise, 
as  in  Eq.  (1.2),  P,  may  not  work  as  well.  If  components  of  x  are  not  sufficiently  dissimilar  lelk^  and 
elkj),  with  k|  close  to  kj),  then  Px  may  not  resolve  these  two  signals;  this  happens  particularly  when 
M  is  small.  Also  the  noise  vectors  can,  in  certain  cases  (such  as  when  w  is  small  compared  to  the 
Nyquist  frequency  ctr/A),  appear  to  come  from  localized  sources. 

The  quantity  inside  the  absolute  value  in  Eq.  (1.3), 

yifk)  “  efk)"*^  x/A/  (1-4) 

has  the  form  of  a  linear  filtering  of  x;  that  is,  we  can  write  Eq.  (1.4)  asyilk)  -  ylk), 

ylk)  =  b*x blk)*x  (1-5) 

if  we  use  b  ~  bik)  =  elk)/M.  The  filter  (or  vector  of  weights)  ft  -(/>(!),  ...  ,  blM))^  depends  on 
k,  the  value  being  tested  at  the  moment.  We  filter  x  through  each  of  the  ft(k),  hoping  that,  because 
of  the  way  ft(k)  has  been  designed,  if  x  has  any  signal  component  corresponding  to  k  this  component 
will  be  passed,  while  others  are  suppressed.  Because  the  vectors  have  finite  length  M,  it  is  impossible 
to  suppress  completely  all  other  components.  In  subsequent  sections  we  consider  ways  of  improving 
the  filter.  In  the  case  of  Px  the  filter  is  (essentially)  elk)  itself;  in  other  methods  ft(k)  uses  e(k),  but 
is  not  simply  (a  multiple  of)  e(k),  as  it  is  here. 
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Returning  to  Eq.  (1.2),  we  can  write  it  in  matrix  notation  as 

X  ^  Ea  +  n  ,  (1.6) 

where  there  are  as  many  columns  in  E  (and  entries  in  u)  as  there  are  k  in  the  large  finite  set  we  are 
using.  Then  Eq.  (1.6)  is  an  underdetermined  system  of  noisy  linear  equations  and  there  is  a  sizable 
literature  on  the  subject  of  "solving"  such  systems  (1).  In  the  array  processing  case,  there  are  two 
aspects  that  make  the  problem  distinctive: 

•  we  usually  have  a  collection  of  various  x,  the  aik)  are  viewed  as  random  and  we  desire 
only  the  average  power  <  |a(/c)P>,  or  the  correlation.  <a(k)  a{l)>  and 

•  we  have  the  prior  information  that  all  but  a  few  of  the  a(k)  are  zero,  and  we  seek  ways  to 
incorporate  this  information. 

The  following  sections  present  a  number  of  methods,  based  on  linear  filtering,  to  help  us  decide 
how  many  signal  components  are  present  and  what  the  signal  powers  and  cross  correlations  are.  Those 
filters  constrained  by  b^eik)  =  1  will  provide  estimates  of  signal  power,  while  others,  not  so  con¬ 
strained,  provide  only  indicators  of  signal  presence  and  do  not  simultaneously  estimate  power.  At  the 
end  of  the  fourth  section  of  this  report  we  discuss  these  ideas  again. 

Because  this  is  a  tutorial,  and  not  a  survey  paper,  no  attempt  is  made  to  provide  extensive  refer¬ 
ences  to  the  literature  as  we  proceed.  There  are  a  number  of  standard  survey  articles  that  should  be 
consulted,  which  provide  good  bibliographies  [2-4l. 
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2.  OPTIMAL  NOISE  SUPPRESSION 


The  conventional  estimator,  P\  in  Eq.  (1.3),  was  derived  earlier  from  a  simple  matching  of  x  with 
each  e(.k)\  if  x  consists  of  one  signal  component  eik^),  this  estimator  will  tell  us  what  ko  is.  For  cer¬ 
tain  noises,  however.  Pi  can  produce  misleading  results.  The  purpose  of  this  section  is  to  consider 
filters  designed  to  optimally  suppress  the  effect  of  the  noise  vector  n. 

We  now  assume  that  the  a(k)  each  have  mean  zero,  that  n  has  mean  zero,  and  that  the  a(k)  are 
independent  of  n  (although  not  necessarily  of  each  other).  We  also  assume  that  the  averaging  time 
(number  of  time  blocks)  is  sufficient  to  permit  R  to  be  written  approximately  as 

R  =  Ro=  (t^EAqE'*'  +  p^yVo,  trace(/lo)  =  trace(Ao)  =  M,  (2.1) 

with  (t^Aq  -  [<a(k)  a(/)>I,  p^No  -  [</i(m)  n(/)>).  Our  goal  is  to  estimate  Aq,  which  we  do  not 
assume  to  be  diagonal,  so  as  to  include  the  possibility  of  correlated  arrivals.  In  what  follows  we  use  A 
and  N  to  denote  prior  estimates  of  o-^Aq  and  p^Nq. 

If  all  the  a{k)  are  zero  (no  signals)  then  the  conventional  estimator  gives 

PiOf)  -  eik^ReikyM^  ph(kVNoe(.k)/M\  (2.2) 

so  that  if  No’‘  I  (/  the  identity  matrix)  then  P\{k)  is  constant  for  all  k.  If  the  noise  is  correlated 
between  sensors  /),  however,  Pi(k)  can  exhibit  local  maxima  that  can  be  mistaken  for  low 

level  signals. 

Holding  k  fixed  for  the  moment,  consider  two  hypotheses:  (ffo)  R  =  Ng  (no  signals);  (^i) 
R  e(k)  eik)*  -t-  Ng.  Let  b  be  any  linear  filter  (complex  A/-vector)  and  consider  its  performance  in 
each  case.  We  have 

(Hg):  <  |6-"xP>  -  b*Rb  -  b*Ngb\  and  (2.3) 

(//,):  <  |6+xP>  -  \b*e(k)\^  +  b*Ngb.  (2.4) 
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We  choose  b  to  pass  e{k)  undistorted  {b*e(k)  =  1),  while  minimizing  b^Nb.  It  follows  from  vector 
calculus  that 

b  =  XA^“*e(/c),  with  X  =  \le{kyN~'e{k).  (2.5) 

Applying  this  filter  to  our  data  vectors,  x  produces  output 

>'2(^)  =  ke(k)*  N~^x\  (2.6) 

the  average  magnitude  squared  output  is  our  optimal  noise  suppression  estimator, 

P^(k)  -  <  |yj(X)P>  -  \'^e(kyN-'RN-'^e(k).  (2.7) 

Note  that  if  the  noise  is  assumed  to  be  uncorrelated  (A  =  /),  then  P\{k)  =  Pjffe)  for  all  k. 

Correlated  noise  matrices  are  a  common  occurrence  when  a  uniform  line  array  is  used  in  which 
the  spacing  A  is  smaller  than  the  Nyquist  rate  (1/2  wavelength  or  irc/w).  Cox  [5]  discusses  various 
intersensor  correlations  that  can  be  observed  in  commonly  encountered  ambient  noise  environments. 
In  practice  we  usually  do  not  know  Nq  exactly,  so  we  either  model  it  using  prior  information  or  esti¬ 
mate  it  from  neighboring  frequency  bins  or  signal-absent  measurements,  if  available.  One  particular 
matrix  that  is  used  to  model  ambient  (spherical)  isotropic  noise  on  a  uniform  line  array  is  the  sine 
matrix,  with  entries 

N„  j  =  <n(m)  n(j)>  =  sin  ((rw  —  y)Aa)/c]/(m  —  j)An.  (2.8) 

If  the  array  is  Nyquist,  then  this  matrix  is  /;  otherwise  sensors  are  mutually  correlated. 

The  estimator  P^ik)  requires  the  inversion  of  N.  If  N  is  the  sine  matrix  and  A  is  considerably 
smaller  than  ttc/w,  then  approximately  A/(A<o/jrc)  of  its  eigenvalues  are  close  to  1,  the  rest  is  close  to 
zero  and  N  is  ill-conditioned  [6].  If  we  use  the  sine  matrix  in  place  of  the  true  Nq  in  computing  Piik), 
and  if  the  true  Nq  also  contains  a  small  amount  of  uncorrelated  noise,  then  the  estimator  becomes 
unstable.  It  is  always  safer  to  add  a  little  to  the  main  diagonal  of  the  sine  matrix  for  stability. 

In  deriving  P2  we  held  k  fixed  and  considered  two  possibilities:  no  signal  (//q);  and  one  signal  in 
the  k  direction  {H\).  Clearly  there  is  another  alternative;  there  can  be  signals  in  other  directions.  The 
filter  Eq.  (2.5)  is  not  designed  to  take  this  into  account.  Consequently  the  presence  of  signals  in 
nearby  directions  can  cause  a  large  output  in  the  k  direction,  even  when  there  is  no  signal  in  that  direc¬ 
tion.  This  can  lead  to  loss  of  resolution  and  poor  sidelobe  structure. 

The  term  "sidelobe  structure"  refers  to  the  various  values  of  an  estimator  P(k)  if  k  is  held  fixed, 
but  R  =  e(.l)  edV  is  varied  as  a  function  of  /.  That  is,  the  sidelobe  structure  is  the  estimator  output 
at  k  caused  by  a  single  signal  e(/),  viewed  as  a  function  of  /.  Clearly  it  is  good  to  have  low  sidelobes, 
so  that  signals  in  other  directions  do  not  produce  high  output  in  direction  k.  One  problem  with  the  P\ 
estimator  that  persists,  even  when  the  noise  is  uncorrelated,  is  the  dependence  of  the  sidelobe  structure 
on  the  array  configuration.  We  consider  this  problem  next. 

3.  OPTIMAL  SIDELOBE  PLUS  NOISE  SUPPRESSION 

As  before,  we  hold  k  fixed  temporarily  and  design  the  filter  b  —  b{k).  We  define  the  sidelobe 
function  of  filter  b  to  be 

siD-b^eil),  (3.1) 

the  output  of  the  filter  when  the  input  is  eil).  Subject  to  the  constraint  b^  e{k)  —  1,  we  minimize 

'^UiDV  +  b*Nb.  (3.2) 
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Note  that  because  sC/c)  —  1  by  constraint,  it  is  not  necessary  to  say  */  ^  k’  in  the  summation.  We  can 
write 

£  U(/)P  -  £  siDsUy  -  j£ e(/)c(/)-^j  b-b^EE^b 

so  Eq.  (3.2)  becomes  b*(EE*  +  N)b.  Our  solution  is  then 

b  -  kiEE^  +  N)-'  e(k),  X  -  l/c(lt)+(££^  +  N)-^e(k).  (3.3) 

Applying  this  filter  to  the  data  gives 

y3(X)-Xe(X)+(££+  +  N)-’x,  (3.4) 

and  estimator 

Piik)  -  <|>3(X)P>  -  k^eikriEE^  +  N)-‘ /?(££+  +  N)-^e(k).  (3.5) 

Implicit  in  our  concern  about  sidelobe  structure  is  our  desire  to  control  the  average  behavior  of  our 
estimator,  as  signals  appear  at  the  various  /.  Compare  Pj  and  £]  in  this  average  sense,  with 
N-  0  -  0/. 


For  a  given  /  and  R  —  e(l)e(l)*,  the  estimator  Pi(k)  yields 

Piik'J)  —  eik)^e(l)eU)*e(k)IM^, 
so  that  averaging  over  /  we  get  (except  for  a  constant) 


average  {P^{k■,l))  ==  eik)* 


£e(/)e(/)+  e(k)  -  eikVEE^eik). 


Using  £3  instead  we  obtain 

P3(k;/)  -  k^e(k)*(EE^)-‘e(/)e(/)-^(EE*^)-'e(k), 
so  that  averaging  over  /  gives  (except  for  a  constant) 

average  PjCk;/)  -  \^e(k)UEE^)-'  EE^(EE*)~'e(k) 

-  l/e(ifc)+(££^)-‘e(it). 


(3.6) 

(3.7) 


(3.8) 

(3.9) 


As  we  see  in  Section  5,  Eq.  (3.9)  is  the  maximum  likelihood  or  Capon’s  estimator  of  the  field  consist¬ 
ing  of  a  uniform  distribution  of  the  e(/).  Examination  of  Eq.  (3.9)  for  specific  array  configurations 
shows  it  to  be  essentially  constant  over  k,  whereas  Eq.  (3.7)  can  vary  considerably  as  a  function  of  k. 


If  we  have  some  prior  information  about  the  relative  power  in  signals  and  noise,  we  could  modify 
the  above  procedure  and  instead 

minimize  p^b*EE*b  +  b*Nb,  subject  to  b*  e{k)  —  1.  (3.10) 

This  leads  to  a  generalized  version  of  £3: 

£3(k)  -  ah(k)*(fi^E£^  +  N)~'  R  {fi^EE^  +  N)"*  c(k),  (3.11) 

with  a}  —  1  e{k)^{fi^EE*  -t-  N)~'  e{k).  Note  that  even  when  the  noise  is  assumed  to  be  uncorrelated 
(N  —  /),  the  £|  estimator  will  not  be  an  optimal  sidelobe  suppression  estimator  unless  EE*'  —  /  as 
well.  The  matrix  EE*  involves  the  geometry  of  the  array  and  the  chosen  set  of  /.  In  fact  EE*  may  be 
identical  to  a  matrix  one  might  use  to  model  ambient  noise;  it  might  be  a  sine  matrix,  for  example. 
This  is  because  one  way  to  model  ambient  noise  is  by  imagining  a  large  number  of  independent  far- 
field  sources  uniformly  distributed  in  space  or  in  a  plane  (Si. 


CHARLES  L,  BYRNE 


In  this  section  we  have  improved  upon  the  P-i  estimator  by  including  sidelobe  suppression  as  a 
constraint.  A  problem  still  remains,  however.  Because  we  do  not  yet  know  which  signals  are  present, 
we  do  not  know  which  sidelobes  (which  /)  are  troublesome.  Sidelobes  are  a  problem  only  when  some¬ 
thing  is  out  there  in  that  direction.  In  the  next  section  of  this  report  we  look  at  how  we  might  use  prior 
information  to  suppress  primarily  those  sidelobes  that  pose  a  problem. 

4.  INCORPORATING  PRIOR  INFORMATION 

In  the  previous  section,  we  derived  the  estimator  (3.1 1)  by  including  in  that  part  of  the  data  to 
be  filtered  out  all  signal  components,  corresponding  to  /  different  from  k,  that  might  be  present.  If  we 
have  prior  knowledge  of  the  locations  and  relative  strengths  of  actual  signals  we  can  use  this  to  modify 
our  estimator.  The  resultant  procedure  is  related  to  the  optimal  Bayesian  solution  for  Eq.  (1.6)  [11. 

Suppose  that  we  have  a  prior  estimate,  of  the  signal  correlation  matrix  Ag.  For  fixed  k  we 
seek  filter  b  to  minimize 

b^(EAE*  +  N)b,  subject  to  b*e(k)  =  1  .  (4.1) 

The  desired  b  is 

b  -  XiEAET-  +  N)-'eik),  X  -  X(A:)  -  l/e(X)+  (EAE*  +  A)"'  e(k) .  (4.2) 

The  filter  output  is 

y^(k)  ~  k(k)  e(k)-*-  (EAE*  +  N)-'x,  (4.3) 

and  the  averaged  squared  output  is  our  estimator, 

P4  (^)  -  k(k)^  e(k)*  (EAE*  -1-  N)-'  R  (EAE*  +  N)-'  e(k).  (4.4) 

The  output  y4(X),  viewed  as  an  estimate  of  a(k),  is  closely  related  to  the  Bayesian  solution  of  Eq. 
(1.6),  as  we  now  show. 

With  the  assumption  that  a  and  n  are  independent  complex  Gaussian  random  vectors  (see 
Appendix  B)  with  mean  zero  and  covariance  matrices  Ag  and  Ng,  respectively,  the  probability  density  of 
Jr,  given  a,  is 

P(x|o)~  exp  (  -  (jr  -  Ea)^  (p^Afo)“'(jr  -  fa)),  (4.5) 

where  ~  means  "is  proportional  to."  The  probability  density  of  a  itself  is 

p(a)~  exp  (-  a'*'(cr^Ag)~'a),  (4.6) 

so  that  the  probability  density  of  a  given  x,  is 

P(alx)-~  p(xla)p(a) ,  (4.7) 

or 

p(a|x)~  exp  I  -  (jr  -  fa)"^  (p^No)~'  (x  -  Ea)  -  a'''(<7^  i4o)~'aj.  (4.8) 

The  Bayesian  approach  tells  us  to  .select  as  our  estimate  of  a  the  expected  value  of  Eq.  (4.8).  But  the 
expected  value  in  this  case  is  also  the  value  of  a  for  which  Eq.  (4.8)  is  largest.  Consequently  the  Bayes 
estimate  is  a,  where  a  minimizes 

G(a)  -  (x  -  fa)'*'  (p^  ~  Ea)  -E  (a-^Ao)~^a.  (4.9) 

From  vector  calculus  it  follows  that 

a  "  (t^AqE^  (o-^  EAgE*  +  p^Ng)~^x. 


(4.10) 
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Consider  for  a  moment  the  case  in  which  Aq  is  diagonal,  Aq  =  diag  (/4o(/,/)  lall  /}.  Then  the  A^th  entry 
of  Eq.  (4.10)  becomes 

a(k)  =  Ag  (/c,/c)  e(k)*  (tr^  £Ag  +  p^No)~^x.  (4.11) 

Here  the  statistics  are  known  {a-^Ag  and  p^Ng  assumed  known)  and  are  being  used  to  estimate  an  indi¬ 
vidual  a.  In  Eq.  (4.3),  A  and  N  are  estimates  of  a^Ag  and  p^Ng.  Also  k(k)  is  an  estimate  of 
cT^Ag{k,k),  which  is  the  true  signal  power  in  the  k  direction.  The  expression  for  \(/c)  in  Eq.  (4.2)  will 
be  encountered  again  in  Section  5  of  this  report;  it  is  the  Capon,  or  maximum  likelihood  (ML)  estima¬ 
tor  of  k  directional  signal  power,  given  EAE^  +  N  as  the  cross-sensor  correlation  matrix.  So  kik)  is  a 
prior  estimate  of  signal  power,  based  on  our  prior  estimate  of  Rg. 


•  4 

'  -  •  ••>  'j 
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Returning  to  the  general  case,  in  which  /4o  is  not  necessarily  diagonal,  it  is  interesting  to  note  that 
the  Bayes  estimator  Eq.  (4.10)  does  not  have  the  same  form  as  (4.3).  The  kih  entry  of  a  can  be  writ¬ 
ten 

a(k)~<T^  "^AgikJ)  EAgE^ +p^No)-'x,  (4.12) 

which  suggests  that  when  signals  are  correlated,  our  constraint  b'*'  e(k)  =  I  is  not  the  best  thing  to  use. 
This  makes  sense;  up  to  now  our  approach  has  been  to  hold  k  fixed,  find  b(k),  derive  the  estimates 
for  a(k)  and  <|a(<r)P>,  and  then  move  to  another  k.  We  do  not  require  the  estimate  >’(/t)  or  Pik) 
to  have  any  particular  global  properties,  as  functions  of  k.  But  when  the  signals  are  correlated  what 
happens  at  one  k  is  related  to  what  happens  elsewhere;  this  has  not  been  taken  into  account  in  the 
development  so  far. 

Based  on  the  Bayesian  solution  in  the  general  case,  we  can  modify  estimators  >"4  and  as  follows: 
yU  (k)  -  {j^Aik^De  (/)*|(£.4£^  -b  N)-'x, 


PU(k)  ^  <  \y\(k)\^>.  (4.13) 

Unless  we  have  sufficient  prior  information  to  warrant  using  a  nondiagonal  A  as  our  estimate  of  Ag, 
these  modified  estimators  reduce  to  those  of  Eqs.  (4.3)  and  (4.4). 

As  previously  mentioned,  the  k(k)  used  in  Eq.  (4.3)  is  a  prior  estimate  of  cr^Ag{k,k).  Suppose 
A  -  diag  {>4  (/,/))  and  we  consider  the  signals-only  case.  Then  k{k)  =  l/eik)*  {EAE^)~^e(k).  With 
L  —  diag  (x(/)),  is  it  true  that  L  —  A,  that  is,  are  our  initial  estimates  of  the  signal  powers  the  same  as 
theX(X)?  No,  they  are  different,  generally.  Writing 

1/X(X)  -  e(X)+ (£4£+)-' e(X),  (4.14) 

we  see  that  the  diagonal  entries  of  the  matrix  E* {EAE^)~' E  are  1/X(/),  so  that  the  diagonal  entries  of 
AE*(EAE^)~^ E  are  A  U, Dikii).  But  the  trace  is  the  sum  of  the  diagonal  entires,  so 

\x^CQiAE*iEAE^)-'E)  -  Z  A{l,D/kiD.  (4.15) 

I 

But  trace  (/l£+(£>l£+)-'£)  -  trace  iEAE^)-'  EAE*)  -  M  so 

A/-  £/4(/,/)/x(/).  (4.16) 

I 

If  there  are  more  than  M  nonzero  A  (/,/)  in  the  summation,  then  not  all  the  terms  can  equal  1.  We 
shall  return  to  this  point  in  our  discussion  of  Capon’s  method  in  Section  5. 

The  linear  methods  presented  in  the  first  four  sections  of  this  report  provide  estimates  of  the 
power  in  the  various  directions  for  the  components  that  are  passed  by  the  filters:  the  various  Pik)  are 
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considered  to  be  estimators  of  <|a(/c)P>,  so  that  when  there  is  no  signal  in  the  k  direction  P(k) 
should  be  nearly  zero.  The  problem  then  is  to  constrain  the  filters  bik)  properly.  Later  in  this  report 
we  consider  nonlinear  methods.  It  is  not  correct  to  view  these  estimators  as  giving  power:  they  give 
only  indications,  through  their  peaks,  of  signal  presence.  Once  we  have  used  these  methods  to  deter¬ 
mine  y,  the  number  of  actual  sources,  and  their  corresponding  k  we  can  include  these  quantities  as 
prior  information  in  a  Bayesian  approach  to  estimating  Aq.  We  can,  for  example,  take  A  (k,k)  =  I  if  k 
corresponds  to  an  actual  source,  according  to  our  best  estimates,  and  0  otherwise.  Then  rank  (EAE*) 
is  J.  Our  estimate  of  a(k)  is  then  y^ik)  in  Eq.  (4.3).  To  estimate  Ao(kJ)  we  can  use 
<y'4(k)>'4(/)  > ,  where  k  and  /  are  limited  to  those  values  already  deemed  to  correspond  to  actual 
sources. 

5.  DATA  ADAPTIVE  FILTERING  AND  CAPON’S  METHOD 

We  saw,  in  our  discussion  of  the  Bayesian  method  and  its  relation  to  the  estimators  in  the  last 
section,  that  the  matrix  EAE  +  N  that  appears  in  Eqs.  (4.3)  and  (4.4)  is  a  prior  estimate  of  the  matrix 
ct^EAqE"^  +p'^Nq,  which  is  our  model  for  the  true  cross-sensor  correlation  matrix,  Rq.  Capon’s  idea  is 
to  dispense  with  prior  estimates  of  Rq  and  to  use  instead  the  actual  measured  R. 

Holding  k  fixed,  we  find  that  the  desired  filter  b  is  now 

b  =  \(k)R-^e(k)Mk)  =  \le(k)*R-^e(k),  (5.1) 

the  filter  output  is 

y^ik)  =  b'*^  X  =  k(k)e(k)'^  R~^x  (5.2) 

and  the  averaged  squared  output  power  is  the  estimator, 

Piik)  -  \(k)h{kyR-^RR-'e(k)  =  M eik)* R-'eik)  =  k(k).  (5.3) 

This  power  estimate  is  Capon’s  maximum  likelihood  (ML)  estimator.  Note  this  estimator  is  not  linear, 
that  is  although  y^ik)  =  b^x  appears  to  be  a  linear  function  of  x  there  is  a  hidden  occurrence  of  x  in 
the  b^  term,  because  b  depends  now  on  R ,  which  depends  on  x.  For  this  reason,  ML  is  called  a  non¬ 
linear  estimator.  It  has  also  been  referred  to  as  a  data  adaptive  procedure,  because  the  filter  adapts  not 
to  any  prior  estimate  of  what  is  out  there,  but  to  the  actual  data  contained  in  /?.  It  is  helpful  to  note 
that  the  filtering  scenario,  whereby  x  is  received  and  filtered  by  b  to  produce  y(k),  cannot  now  occur 
in  time,  because  of  the  dependence  of  b  on  the  entire  collection  of  data  vectors  x.  The  averaged  mag¬ 
nitude  squared  of  the  filter  output  is  no  longer  power  because  of  the  dual  dependence  on  x. 

How  well  does  P^ik)  estimate  power  in  the  kth  signal  component?  To  see,  let  us  imagine  a 
noise-free  case,  with  uncorrelated  signals,  R  —  a-^EAoE^,  Ao“  diag  (Agi/,/)).  Recalling  our  discussion 
concerning  L  and  A  at  the  end  of  the  previous  chapter,  we  know  that  we  do  not  have 
P^fk)  =  cr^Ag(k,k)  for  each  k,  generally.  Because  the  sum  of  ar^Ag(k,k)/Psik)  over  all  k  must  equal 
M  Eq.  (4.16),  the  P^ik)  can  equal  the  (r^Ao(k,k)  only  if  there  are  precisely  M  nonzero  entries  on  the 
diagonal  of  Ag.  If  there  are  fewer  than  M  actual  signals,  then  P^ik)  must  underestimate  some  of  the 
a-^Ag(k,k),  while  if  there  are  more  than  M  signals,  P^  (k)  will  overestimate  some  of  their  powers.  In 
most  practical  cases  Ng  is  present  as  well,  giving  the  appearance  of  many  smaller  signals  distributed  in 
various  directions,  so  typically  P^(k)  overestimates  the  (T^Ag(k,k).  Part  of  what  it  sees  as  signal  is 
noise  power  in  that  direction. 

Nonlinear  methods  have  been  studied  carefully  since  the  late  1960s  when  it  was  observed  that 
Capon's  ML  and  Burg’s  maximum  entropy  method  (ME),  to  be  discussed  later,  were  able  to  resolve 
closely  spaced  sources  that  linear  methods  could  not  (7).  There  are  many  different  derivations  of  these 
nonlinear  methods  that  shed  some  light  on  their  superior  resolving  capability,  but  by  far  the  most 
revealing  approach  to  take  is  the  analysis  of  the  generalized  eigenvector  structure  of  R  [8].  Not  only 
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does  this  route  provide  a  clear  understanding  of  why  nonlinear  methods  tend  to  resolve  better,  it  helps 
us  see  what  can  go  wrong  and  suggests  safer  alternatives. 

Assume,  from  now  on,  that  there  are  onij  values  of  k  for  which  A^ik.k)  >  0.  Let 

these  values  of  k  be  indexed  kj  j  =  1,  ...  ,/,  and  let  eO)  =  eikj),  aij)  =  a(kj).  Let  Bq  be  the  J  by 
J  matrix  obtained  from  Aq  by  removing  all  rows  and  columns  whose  main  diagonal  entry  is  zero.  Let 
F  be  the  M  by  J  matrix  whose  7th  column  is  e(J).  Then  Rq  can  be  written  as 

Ra=  (5.4) 

To  avoid  technical  difficulties  we  also  assume  that  the  rank  of  Bq  is  V,  that  is  although  signals  may  be 
mutually  correlated,  there  is  no  complete  correlation  that  would  reduce  the  rank  of  Bq. 

The  generalized  eigenvectors  of  the  pair  (Rq  Nq)  are  the  vectors  =  1 . A/,  that  solve  the 

following  constrained  maximization  problem; 

maximize  z'^/?oZ,  subject  to  constraints  =  1  and  z^z„  =  0,1  ^  n<m.  (5.5) 

The  values 


are  called  the  generalized  eigenvalues  corresponding  to  the  pair  (Rq  Nq).  note  that  t'„  >  c„  +  i  >  for 
dll  m.  The  z,„  can  also  be  defined  as  the  solution  to  the  minimization  problem; 

2'  R:  subject  to  the  constraints  z’^A,,  z  =  1,  z"’z,  =  O.m  <  n  <  M.  (5.7) 

Because  rank  (Bq)  =  J  <  M.  it  follows  that  for  m  >  7  +  1  and  z^  (FBqF^’z„  =  0. 

Writing  for  ni  J  +  \. 

0  =  z:(FBQr)z„  =  (rz„)"5c  iF^zJ  ;  (5.8) 

It  I'ollows  from  rank  {Bq)  =  J  that  F^z„  =  O.m  ^7+1.  Therefore,  for  each  7  =  1,  . . . ,  7. 

c(7)^'m  =  0.  (5.9) 

If  the  model  Eq.  (5.5)  is  exactly  equal  to  R.  and  we  know  Nq  exactly,  we  can  calculate  the  c„  and  z„. 
We  determine  7  and  p^  from  the  multiplicity  and  value  of  the  lowest  generalized  eigenvalue,  and  the 
value  k,  from  the  zeros  of  the  functions  eik)*z„  m^J  +  1.  How  would  we  calculate  these  generalized 
eigenvectors? 

The  generalized  eigenvectors  for  the  pair  {Rq,!^q)  can  be  found  by  calculating  ordinary  eigenvec¬ 
tors  of  another  matrix.  Let  Nq=  and  set  T  =  (L'’^)'*  Rq  Let  m  =  ,  M  be 

orthonormal  eigenvectors  and  eigenvalues  of  T  with  >  X„+i  for  all  m.  Then  z„  =  u„  and 
c„  =  A„,  for  all  m. 

In  practice  we  do  not  have  Rq  and  Nq  but  R  and  N.  To  avoid  cumbersome  notation  let  us  use 
z„,  c„  for  the  pair  (R,N),  keeping  in  mind  that  N  only  approximates  p^Nq,  and  that  the  computed 
m  ^  J  F  1,  will  not  be  precisely  identical.  Let  us  refer  to  the  z„,  m  ^  7  -b  1,  as  the  /ow general¬ 
ized  eigenvectors  (LGE)  and  the  others  as  high  (HGE).  Similarly,  the  m  >  7  -b  1,  are  the  low 
eigenvectors  of  T  (LE),  the  others  the  high  ones  (HE).  We  now  return  to  Capon’s  estimator  and 
analyze  it  from  the  standpoint  of  the  generalized  eigenvectors  of  {R,N). 

Set  C  =  diag(c„,  m  =  1,  . . .  ,  A/)  (recall  c„  =  X„),  and  let  U  be  the  M  by  M  matrix  whose  mth 
column  is  u„.  Then  UCU^  =-  T  and  UU*  —  /.  Therefore 
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and 

/?->=  K-'7^‘(r^)“‘ =  (^^‘^/)C-‘(^'“'^/)•' =  ZC-'Z+,  (5.11) 

where  Z  is  the  M  by  M  matrix  whose  mth  column  is  z„.  Note  if  N^l,  then  the  decomposition 
/?“'  *=  ZC~^Z*  is  not  the  usual  eigenvector  decomposition  of  in  particular,  the  z„  are  not  orthog¬ 
onal  (ZZ+  ^  I). 

From  Eq.  (5.11)  we  can  write 

R-'  =  i  c-'  (5.12) 

m— I 

so  that  Ps(k)  in  Eq.  (5.3)  becomes 

Psik)  =  ^  c-'WikY  zj\  (5.13) 

We  know  from  Eq.  (5.9)  that  0  =  e(j)*z„  for  the  LGE  so  that  if  we  did  not  have  the  HGE  terms  in 
Eq.  (5.13)  Ps(kj)  would  be  +  <».  Because  c„  ^  c„+|  the  reciprocal  weighting  of  terms  by  c~' 
emphasizes  the  desired  LGE  terms,  and  more  so  as  the  S/N  increases. 

The  next  step,  then,  in  improving  the  performance  of  our  estimators,  is  to  reduce  the  weighting 
given  to  the  HGE  terms  in  Eq.  (5.13).  We  can,  of  course,  simply  calculate  the  z„  (assuming  we  know 
No  and  the  model  is  accurate)  and  keep  the  terms  we  want.  This  is  sometimes  done  and  we  discuss  it 
in  Section  7  of  this  report.  First,  however,  we  consider  further  improvement  through  constraint  mod¬ 
ification  because  this  approach  leads  us  to  several  other  nonlinear  methods  that  have  appeared  in  the 
literature,  including  Burg’s  maximum  entropy  method  (ME)  [9],  Johnson’s  linear  predictive  (LP) 
estimator  [2],  and  the  Byrne-Fitzgerald  extensions  of  ME  (10,111. 

6.  INCREASING  LEAKAGE  SUPPRESSION 

Each  of  the  estimators  we  have  discussed  so  far  performs  well  when  there  is  a  signal  in  the  direc¬ 
tion  of  look  k,  because  of  the  constraint  b^eik)  =  1.  However,  when  there  is  no  signal  in  direction  k, 
but  signals  nearby,  there  can  be  a  high  output  in  the  k  direction  due  to  leakage  from  these  nearby 
sources.  Two  signals  close  together  can  cause  the  estimator  to  give  a  higher  reading  at  values  of  k 
between  the  two  signals  than  it  gives  at  either  of  the  two  themselves,  causing  loss  of  resolution.  To 
improve  resolution  we  must  increase  leakage  suppression.  This  is  related  to  decreasing  the  weighting  of 
the  HGE  terms  in  Eq.  (5.13),  as  we  shall  see  shortly. 

The  LGE  are  orthogonal  to  each  of  the  signal  vectors  e(/),  y  =  1,  ....  7,  so  the  span  of  these 
e(j)  must  coincide  with  the  span  of  the  HGE  {span  meaning  the  collection  of  all  linear  combinations  of 
the  indicated  set  of  .ectors).  Therefore  increased  emphasis  on  signal  suppression  is  equivalent  to 
reducing  the  HGE  weighting  in  Eq.  (5.13).  Writing  T  =  UCU*,  T”  =  UCU^,  n  —  1,  2,  . . .  ,  we  see 
that  by  raising  T  to  a  power,  we  increase  the  relative  size  of  c~'  for  m  ^  J  +  I,  compared  to  m  <  7. 
Suppose  we  replace  /?  =  with 

^  (n)  ^  y+  j'n  y 

in  the  P^  estimator.  From  the  equation 

(/{(/.))-!  _  2(r"  Z+, 

we  see  that  we  obtain 

Pj'-’  (k)  -  l/x  c-"|e(fc)+  z„P,  (6.3) 

/  (FI-I 


(6.1) 

(6.2) 


10 
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and  that  as  n  approaches  +oo  Eq.  (6.3)  approaches  the  explicit  generalized  eigenvector  solution, 

Pr(k)~l/  £  (6.4) 

/  m— y+1 

We  shall  return  to  Eq.  (6.4)  in  the  next  section  of  this  report.  Now  we  want  to  consider  some  special 
cases  of  Eq.  (6.3),  particularly  for  n  —  2. 

Using  n  =  2  \n  Eq.  (6.3),  we  obtain 

(k)  -  l/e(kyR-'NR-'e(k).  (6.5) 

If  we  replace  N  with  a  dyad,  pp'*^,  we  obtain  an  approximation  of  Eq.  (6.5)  that  involves  less  computa¬ 
tion; 

Pl,(k)~l/\e{kyR-'p\l  (6.6) 

If  the  noise  N  is  highly  correlated,  then  p  could  be,  for  example,  the  largest  eigenvector  of  N.  If  we 
have  a  uniform  line  array  and  the  noise  is  stationary,  so  that  N  is  Toeplitz  (constant  on  each  diagonal), 
then  N  is  determined  by  its  first  column,  which  could  be  taken  to  be  p.  In  any  case,  from  the  point  of 
view  we  have  adopted,  these  estimators  appear  as  approximations  only.  In  some  cases,  however,  Eq. 
(6.6)  can  be  derived  as  an  optimal  estimator  in  its  own  right.  If  we  take  p  to  be  the  first  column  of  the 
matrix  /,  then  we  get  either  Burg’s  ME  (if  R  is  Toeplitz)  or  Johnson’s  LP  method  (general  R).  If  p  is 
the  first  column  of  any  positive-definite  Toeplitz  matrix  (not  necessarily  /),  then  Eq.  (6.6)  is  (essen¬ 
tially)  the  Byrne-Fitzgerald  extension  of  ME,  called  the  weighted  reciprocal  spectrum  approximation 
(WRSA)  [11]. 

It  has  been  noted  in  the  literature  that  ME  has  resolving  capability  that  is  superior  to  ML  [7]  and 
also  that  ME  can  give  misleading  results  when  the  noise  is  correlated  [12].  Both  of  these  observations 
are  supported  by  our  analysis.  Because  ME  is  an  approximation  of  Eq.  (6.5)  it  has  the  added  resolving 
power  that  comes  from  n  -  2,  instead  of  n  —  1  for  ML.  If  the  noise  is  correlated,  using  the  first  row 
of  /  as  p  is  a  poor  approximation  of  N  by  dyad  pp*\  it  is  better  to  use  the  first  row  of  N  or  its  largest 
eigenvector.  Hence  the  misleading  results  when  N  ^  I. 

Section  7  considers  the  explicit  generalized  eigenvector  estimators.  Because,  in  practice  there  is 
model  error,  we  must  consider  as  well  the  stability  of  these  estimators. 

7.  EXPLICIT  GENERALIZED  EIGENVECTOR  METHODS 

As  previously  noted,  higher  resolution  is  obtainable  if  we  reduce  the  output  power  of  our  filters 
t>(.k)  for  those  k  that  do  not  correspond  to  actual  signals.  Let  k  be  such  a  vector.  In  order  that 
b  —  b(k)  reject  entirely  the  signal  component,  it  is  necessary  (and  sufficient)  that  6  be  a  linear  combi¬ 
nation  of  the  LGE. 


Let  q  be  an  arbitrary  member  of  the  span  of  the  LGE;  that  is. 


9  “  ^m^m' 

(7.1) 

Let  b  \q  for  some  X.  The  usual  constraint  b'^eik)  —  1  tells  us  that 

X  -  X(X)  -  V  £ 

(7.2) 

/  W—y+I 

which  is  finite  because  k  is  not  one  of  the  kj.  The  filter  output  is  then 

y^{k)  -  \  {k)q^x. 

(7.3) 

II 
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and  the  averaged  magnitude  squared  output  is 

P^{k)  =  |x(Ar)P(7'^/?(7  =  q'^Nq!  ^  d„e(k)'*'z„  ^  (7.4) 

m— 7+1 

This  estimator  is  finite  unless  k  =  kj  for  some  y,  in  which  case  it  is  infinite. 

It  is  important  to  reiterate  the  assumptions  on  which  Eq.  (7.4)  is  based.  We  assume  that  R  =  /?o. 
that  N  =  No,  that  there  are  J(<M)  signals  present  and  that  Bq  has  rank  J .  In  practice  these  assump¬ 
tions  can  be  violated  in  numerous  ways:  the  signal  vectors  e(k)  are  not  precisely  planewaves;  the 
averaging  time  for  R  is  not  long  enough  to  eliminate  cross  terms  between  signals  and  noise,  so  tnat 
R  ^  Rq,  the  noise  matrix  Nq  is  not  precisely  known.  Because  the  information  we  need  is  carried  by 
the  nulls  of  the  functions  e(k)'*^z„  it  is  easily  perturbed.  The  main  problem  with  P^  is  robustness;  that 
is,  stability  in  the  presence  of  model  errors. 

One  particular  source  of  perturbation  is  phase  error,  which  can  be  systematic  (in  the  electronics) 
or  the  result  of  wavefront  curvature  or  array  motion.  These  perturbations  cause  severe  degradation  of 
most  nonlinear  methods  when  the  noise  is  correlated. 

Phase  errors  can  be  modelled  by  taking 

D  =  diag  {exp  m  =  1,  . . .  ,A/1.  (7.5) 

with  {</)„}  M  independent  random  variables,  uniformly  distributed  on  the  interval  [-€,€),  for  some 
small  €  >  0.  We  then  replace  R  with  R  =  DRD*  in  all  the  estimators. 

When  No=  I,  phase  errors  are  not  a  serious  problem  for  nonlinear  estimators,  so  long  as  e  is 
moderate-sized.  But  as  the  noise  becomes  correlated  even  small  values  of  e  (corresponding  to,  say, 
±3°  at  broadside)  can  cause  severe  degradation,  even  for  high  S/N.  In  Section  8  of  this  report,  we 
consider  why  this  is  the  case  and  what  can  be  done  about  it. 

8.  THE  PROBLEM  OF  ROBUSTNESS 

Provided  all  our  assumptions  are  exactly  met,  the  generalized  eigenvector  methods  Eq.  (7.4)  give 
perfect  source  localization  and  resolution,  regardless  of  S/N  level.  In  practice,  instabilities  are  encoun¬ 
tered  so  frequently  that  considerable  concern  has  been  voiced  about  the  effectiveness  of  nonlinear 
methods  in  actual  situations  [13,14).  Much  attention  is  being  given  to  understanding  the  causes  of  in¬ 
stability  and  to  developing  ways  to  combat  it.  In  this  section  we  analyze  one  form  of  perturbation  that 
can  lead  to  severe  estimator  degradation,  phase  errors  with  correlated  noise,  and  discuss  some  recently 
developed  methods  for  stabilizing  in  the  presence  of  such  perturbations.  Our  analysis  of  the  instability 
reveals  that  the  information  we  seek  is  redundantly  stored,  that  most  often  only  partial  loss  occurs,  but 
that  most  methods  interrogate  mainly  the  most  unstable  storage  locations.  The  instability  is  caused  by 
the  sort  of  noise  fields  commonly  encountered  in  sonar  array  processing  and  is  not  directly  related  to 
the  particular  perturbing  effect,  in  this  case  phase  errors,  that  triggers  the  degradation.  The  following 
discussion  summarizes  Refs.  15  and  16. 

If  we  write  R  =  XLX'*’,  where  the  columns  of  X  are  orthonormal  eigenvectors  of  R  and  I  - 
diag  X|  . . .  ,X^|  ,  the  ordered  eigenvalues  of  R,  then  Capon’s  ML  estimator  becomes 

P5(k)=\/z^-'\e(kVxJ\  (8.1) 

with  x„  the  mth  column  of  X.  Clearly  those  terms  for  which  X„  is  smallest  are  given  the  most  weight. 
As  the  noise  becomes  correlated  between  sensors  (Nq  not  diagonal),  the  x„  for  m  near  M  begin  to 
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behave  somewhat  like  the  lowest  eigenvectors  of  the  noise-only  matrix,  Nq,  and  the  low  K„  begin  to 
separate  into  two  groups,  those  in  one  much  smaller  than  those  in  the  other  [61.  If  the  noise  is  isotro¬ 
pic  (spherical)  and  the  array  is  a  uniform  line  array,  then  A'o  is  as  in  Eq.  (2.8).  If  the  spacing  is  much 
less  than  1/2  wavelength,  the  lowest  eigenvectors  of  Nq  are  nearly  orthogonal  to  all  vectors  e(.k) 
corresponding  to  acoustically  feasible  planewave  arrivals.  Figure  8.1  illustrates  what  happens. 


Fig.  8.1  —  Estimator  response  using 
eigenvectors  13  to  25;  1  signal 


In  each  of  the  first  four  figures  we  used  a  simulated  uniform  line  array  with  A/  =  25  sensors  and 
a  spacing  of  1/4  wavelength  (so  twice  oversampled).  The  noise  is  spherically  isotropic.  In  Fig.  8.1  we 
display  the  bearing  response  function 

xj^  (8.2) 

/  m-l 

for  /  =  13,  L  =  25,  with  a  single  planewave  source  at  broadside.  In  each  case  the  top  curve 
corresponds  to  the  case  of  no  phase  errors,  while  the  five  other  curves  involve  independent  simulations 
of  random  phase  errors  of  at  most  ±  5°. 

Figure  8.2  shows  Eq.  (8.2)  in  the  same  cases,  but  with  /  -  2,  L  -  12.  The  information  about  the 
source  is  still  preserved  in  the  (almost)  nulls  of  the  e(k)*x„,  /n  -  2,  . . .  ,  25,  even  though  the  noise  is 
not  white.  However,  for  m  ^  13  these  (almost)  nulls  are  not  much  different  from  neighboring  values 
and  the  relative  sizes  are  easily  disturbed  when  phase  errors  are  introduced.  Those  for  m  —  2,  . . .  ,  12, 
on  the  other  hand,  are  much  smaller  than  their  neighbors  and  the  relative  values  are  not  disturbed, 
hence  Fig.  8.2. 

Figures  8.3  and  8.4  show  the  effect  on  resolution.  Replacing  the  single  source  at  broadside  with 
two  uncorrelated  ones,  at  ±0.00757r/A,  the  smallest  eigenvector  of  /?  (/  -  Z,  -  25  in  (8.2))  fails  to 
resolve  even  without  phase  errors  (because  of  correlated  noise)  (Fig.  8.3),  while  the  choice  of 
/  -  8,  £  —  1 1  (Fig.  8.4)  provides  stable  resolution. 


WAVE-NUMBER  (UNITS  OF  PI/DI 

Fig.  8.4  —  Estimator  response  using 
eigenvectors  8-11;  2  sigtuls 


If  we  wish  to  avoid  the  actual  calculation  of  the  individual  eigenvectors  of  R  and  prefer  an  esti¬ 
mator  with  the  computational  cost  of  ML,  we  can  still  shift  emphasis  to  the  more  stable  eigenvectors, 
as  we  now  show.  The  estimator 

Ptik)  -  1/e 1/2(1?  +  R-'/hik)  (8.3) 

can  be  written  as 

where  JIT  —  0,1,2,  .. .  and  o  are  parameters  chosen  by  the  user.  By  normalizing  R  to  have  trace  1,  we 
make  1  the  average  of  the  eigenvalues;  taking  a  near  1  then  emphasizes  those  in  the  middle.  It  is 
important  that  a  be  greater  than  the  lowest  eigenvalues  corresponding  to  unstable  eigenvectors,  but  not 
as  large  as  Ay,  where  J  is  the  number  of  signals.  As  the  noise  becomes  increasingly  correlated,  the 
number  of  stable  eigenvectors  decreases;  by  raising  K  we  further  emphasize  just  those  eigenvalues  near 
a. 

Figures  8.5  through  8.8  simulate  a  uniform  line  array  of  Af  —  25  sensors,  oversampled  a  factor 
of  5  (A  is  one-tenth  of  a  wavelength).  Isotropic  noise  is  30  dB  above  the  uncorrelated  noise  and  there 
are  two  signals,  at  (0.1  ±0.03)ir/A,  each  with  power  —3  dB  relative  to  the  total  noise  power.  Figure 
8.5  shows  (top  curve)  the  conventional  estimator,  with  and  without  phase  errors  (no  difference),  below 
it  the  ML  without  phase  errors,  and  below  that  ML  for  the  same  five  independent  sets  of  phase  errors 
used  before. 

Figure  8.6  shows  estimator  Ptik)  for  the  same  data;  from  top  to  bottom  we  have  AT  —  1,  4,  and 
12.  Here  —  6.0  and  each  of  the  three  graphs  is  actually  the  five  phase  error  cases  superimposed  on 
the  errorless  case,  showing  almost  perfect  stability.  Remembering  that  AT  -  0  gives  ML  and  comparing 
Fig.  8.5  with  AT  —  I  in  Fig.  8.6  we  see  how  quickly  stability  can  be  achieved  as  we  iterate  beyond  ML. 

Figure  8.7  shows  that  the  estimator  Ptik)  is  not  overly  sensitive  to  the  choice  of  a\  here  we 
have  the  same  case  as  in  Fig.  8.6,  with  AT  —  6  and  —  6.0,  1.0,  and  0.5. 
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When  the  signal  arrivals  are  correlated  most  nonlinear  methods  degrade,  either  not  resolving  or 
giving  biased  estimates  of  arrival  angles  (17).  In  Fig.  8.8  we  have  two  signals  with  correlation  coeffi¬ 
cient  of  (0.9,  0°);  shown  are  ML,  with  and  without  phase  errors  (top  curves)  and  Pgik),  K  -  12, 
—  1.0  for  the  same  cases  (superimposed  lower  curves).  As  the  signals  become  more  correlated,  the 
second  largest  eigenvalue  approaches  those  of  the  noise  (in  the  uncorrelated  noise  case)  and  careful 
selection  of  a  becomes  more  crucial. 

The  real  point  of  these  findings  is  not  the  particular  methods  themselves,  but  the  fact  that  the 
desired  information  is  still  there  in  these  highly  perturbed  cases.  Once  we  know  that  the  information  is 
more  robustly  stored  in  the  higher  eigenvectors  (the  ones  immediately  below  m  —  /),  provided  the 
noise  is  not  too  correlated,  we  can  extract  that  information  in  several  ways.  The  estimators  P^ik)  are 
just  suggested  ways  of  doing  this. 

9.  COMMENTS  AND  CONCLUSIONS 

We  have  derived  a  number  of  well-known  array  processing  methods  by  using  the  notion  of  linear 
filter  to  provide  a  unifying  approach.  Certainly  there  are  other  ways  to  derive  each  of  the  methods;  see 
Ref.  18,  for  example,  where  the  optimal  noise  suppression  method  is  derived  in  three  different  ways. 
There  are  other  methods  which  we  did  not  discuss  (see  Ref.  19,  e.g.),  but  which  can  be  understood  in 
terms  of  filtering,  eigenvector  analysis  and  the  other  tools  we  have  used  in  this  report. 

The  last  word  has  not  yet  been  written  on  the  subject.  Each  application  brings  with  it  its  own 
peculiar  requirements.  Two  directions  of  great  interest  at  the  moment,  for  sonar  array  processing,  are 
statistical  analyses  of  the  performance  of  the  various  methods  [20]  and  the  use  of  more  complicated 
physical  models  to  guide  the  signal  processing  [21].  Discussion  of  either  of  these  topics  in  any  detail 
would  have  taken  us  too  far  afield,  however. 
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Appendix  A 

THE  FOURIER  TRANSFORM  AND  ITS  APPROXIMATION 

The  Fourier  transform  (FT)  of  the  function  /  (t)  is  defined  to  be  the  function 

oe 

Fiat)  -  f  /(t)  exp  iiatt)dt,  (Al) 

for  —  oo  <;  o)  <  oo.  Given  Fiat)  for  all  at,  we  can  recapture  /  by  using  the  inversion  formula, 

fit)  —  f  Fiat)  exp  (—  iatt)  dw/l-tr.  (A2) 

For  the  integrals  in  Eqs.  (Al)  and  (A2)  to  exist,  it  is  necessary  that  both  /  and  F  approach  zero  suffi¬ 
ciently  rapidly  as  |/|  and  |<u|  approach  <».  However,  it  is  useful  to  speak  of  the  FT  of  periodic  func¬ 
tions  and  for  that  the  notion  of  the  8  function  is  needed;  we  denote  by  8 (to)  the  FT  of  the  function 
fit)  —  l/2rr  for  all  t,  so  that  8(to  —  a)  is  the  FT  of  the  function  fit)  —  (l/2ir)  exp  i~/at).  The  8 
function  8 (to  -  a)  is  usually  depicted  as  a  spike  at  a  and  thought  of  as  a  (generalized)  function  that  is 
zero  except  at  a  but  with  integral  l/27r.  These  generalized  functions  can  be  defined  more  rigorously  by 
using  limits  of  ordinary  functions. 

If  we  have  only  finitely  many  values  of  /,  we  cannot  use  Eq.  (Al)  directly;  to  obtain  F  we  must 
either  model  /  (or  F)  in  some  way  or  settle  for  an  approximation  of  F.  We  consider  two  cases: 

Case  A  (equispaced  data):  the  data  values  are  fimUt),  m  —  1, . . .  ,Af,  for  some  A  >  0; 

Case  B  (general  data):  the  data  values  are  /(t*,),  m  ■»  1, . . .  ,N,  for  some  N  distinct  values 
tm- 

We  begin  with  the  modelling  approach  and  then  consider  approximation. 

The  simplest  sort  of  model  for  /  is  a  linear  combination  of  N  known  functions,  g„,  with  coeffi¬ 
cients  a  in)  to  be  determined  from  the  data; 

fit)-  ^ain)g,it),  (A3) 

with 

fit„)  -  '^ain)  g„  it„),m  (A4) 

•—1 

determining  the  a  in)  uniquely,  so  long  as  we  have  chosen  the  g„  so  that  the  matrix, 

G- [C(m,n)-g,  (i;,)),  (A5) 

is  invertible.  Once  the  a  in)  are  found  the  FT  is 

Fiat)  —  a  in)  G„  iw),  (A6) 

If— I 

with  G,  the  FT  of  g,.  The  problem  with  this  approach  is  that  unless  we  employ  prior  information  to 
select  the  g„  in  an  appropriate  manner,  the  whole  procedure  is  completely  arbitrary.  If  the  model  is 
reasonable,  the  FT  in  Eq.  (A6)  will  be  a  good  estimate;  if  the  model  is  not  accurate  Eq.  (A6)  will  be 
useless.  Some  methods  that  appear  to  be  models  are  actually  optimal  approximations  and  as  such  are 


NRL  REPORT  8959 


less  arbitrary  than  they  might  appear.  In  other  cases,  when  certain  models  are  viewed  as  approxima¬ 
tions  they  are  suboptimal  and  should  be  replaced  by  optimal  ones.  One  particular  model  that  can  also 
be  viewed  as  an  approximation  (although  not  always  an  optimal  one)  is  the  discrete  Fourier  transform 
(DFT). 


For  case  A  the  DFT  model  for  /  is 

,v 

/dft  (')  =  X  (-i(u„t),  (A7) 

1 


with  w„=  -  7r(l  -  2n/N)l^.  The  reason  for  this  particular  choice  of  (o„  is  that  the  matrix  G  =  E, 

E  =  [exp  (i(o„  m  A)],  (A8) 

then  has  E~^  =  (\IN)E^,  +  denoting  conjugate  transpose.  We  can  then  write  the  ain)  in  closed 
form;  for  «  =  1 . N, 

ain)  -  (\/N)  ^  f(m\)  exp  (.i<u„  m  A).  (A9) 

m- 1 

The  FT  estimate  is  then 


E^pj  (w)  =  (2n/N)  ^ 
1 


^  /(mA)  exp  (i(u„  wA) 


m— I 


8  (to  —  to„). 


(AlO) 


The  DFT  model  (in  case  A)  assumes  that  F(to)  is  the  sum  of  N  delta  functions,  at  to  =  to „.  We 
could,  of  course,  select  other  values  for  the  to„;  all  we  would  lose  would  be  the  computational  conve¬ 
nience  of  the  closed  form  solution  Eq.  (A9)  and  the  fact  that  Eq.  (AlO)  can  be  numerically  computed 
rapidly,  using  a  fast  Fourier  transform  (FFT)  algorithm. 

The  expression  on  the  right-hand  side  of  Eq.  (A9)  can  be  thought  of  as  a  function  of  general  a>; 
let 

N 

Hioj)  -  ^ /(/wA)  exp  (/tamA).  (All) 

m— 1 

Then  H((d„)  =  ain)  and  within  the  context  of  the  modelling  approach  Hiu)  has  no  significance  for 
any  w  except  the  We  see,  however,  that  H  does  appear  as  an  optimal  approximation;  this  can 
cause  considerable  confusion  if  the  distinction  between  modelling  and  approximation  is  not  clearly 
made.  A  second  source  of  confusion  comes  from  the  practice  known  as  zero-padding. 

Zero-padding  refers  to  the  practice  of  artificially  enlarging  the  equispaced  data  set  by  appending  a 
number  of  zero  values;  that  is,  for  m  =  N  +  \, . . .  ,N  -E  K  let  fim^)  =  0.  When  the  larger  set  of 
values  is  used  as  data  along  with  the  DFT,  the  number  N  in  Eqs.  (A7)  to  (AlO)  is  replaced  by  N  K 
and  the  definition  of  changes  accordingly.  From  a  modelling  standpoint  we  have  simply  changed 
the  model.  But  the  function  //(&»)  still  is  involved  and  gives  the  values  of  the  new  coefficients  ain)  at 
the  new  points  (u„,  because  the  K  new  terms  added  to  Eq.  (All)  are  all  zero.  Zero-padding  really  only 
makes  sense  within  the  context  of  approximation;  creating  new  data  values,  equal  to  zero,  and  changing 
the  model  seem  pointless  and  arbitrary  from  a  modelling  perspective. 

Note  that  in  Eq.  (AlO)  the  FT  estimate  at  cu  -  is  not  //((■>„);  it  is  2Tr  Hi<u„)  Sim  -  m„).  As 
we  shall  see,  in  approximating,  that  H  itself  is  the  estimate  of  F  (except  for  a  constant  factor). 
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When  we  pass  to  the  general  case  B,  there  is  the  temptation  to  modify  Hioi)  in  Eq.  (All)  to 

J  (oj)  =  ^  fU„)  exp  (/<o/„).  (A12) 

m—  1 

The  question  now  is:  for  which  N  values  of  cu  is  7  to  be  used  to  provide  the  model  coefficients?  There 
is  no  model  for  which  J  provides  the  coefficients,  in  general.  Also,  as  we  shall  see,  J  is  not  an  optimal 
approximation  either,  in  general.  Although  J  is  used  in  practice  in  the  case  B  situations,  it  is  an  esti¬ 
mator  of  F  that  is  devoid  of  mathematical  foundation. 


Turn  now  to  approximation.  Suppose  we  take  an  arbitrary  weighting  function  P(io)  >  0  and 
minimize  the  weighted  mean  squared  error, 

J*  |/'(<a)  —  P(cii)  ^  bin)  exp  i i  u) t„)\^  P~^  (oj)  d(o,  (A13) 

n—  1 

with  the  notational  convention  that  if  Pioj)  =  0  then  P~'i<u)  is  defined  to  be  0  [Al]. 


The  bin)  for  which  the  error  is  least  must  satisfy  the  equations 

fit„)  =  ^  bin)  J  ^  Piat)  exp  (/  (uil„  —  t„))d  cu/27r,  (A14) 

n—  1 

m  =  1 _ ,A/.  The  FT  E(w)  is  being  approximated  by  a  function  of  the  form 

.  .V 

F  i(o)  =  Piw)  ^  bin)  exp  (/(o/„);  (A15) 

n—  1 

This  is  not  a  model.  We  are  not  saying  that  Ficu)  looks  like  Eq.  (A15),  but  we  are  using  functions  of 
that  type  to  approximate  Fioj),  whatever  Fiw)  looks  like.  Note  that  by  using  the  t„  in  f(w)  we  are 
able  to  use  our  data  values  on  the  left-hand  side  of  Eq.  (A14);  if  we  had  chosen  another  form  for 
Ficu).  the  set  of  equations  we  would  have  obtained  from  minimizing  the  error  would  have  involved 
values  of  /  that  we  do  not  have.  That  is,  if  we  want  a  minimum  weighted  error  approximation,  the 
form  of  the  approximation  is  determined  partially  by  the  nature  of  the  data  (the  t„  values).  The  Piu) 
is  chosen  by  us,  and  this  allows  us  to  tailor  the  approximation  to  fit  whatever  prior  knowledge  we  may 
have  about  f  and  F.  Some  examples  will  help  to  illustrate  the  procedure. 

Example  1.  Suppose  we  are  in  case  A  and  our  prior  knowledge  is  that  f  is  a  bandlimited  function 
If  has  bounded  support)  and  that  Fiw)  =  0  if  lea |  ^  tt/A.  We  select  Piu))  to  be  the  function  that  is  1 
for  loul  <  tt/A  ,  0  otherwise.  Then  at  least  Fiuu)  in  Eq.  (A15)  will  have  the  proper  support.  Then  Eq. 
I A 14)  becomes 


./’( wA) 


z 


I 


exp  (/  coin  —  m)A)  d(o/2n. 


(A16) 


=  i\/\) bim) .m  =  1,...  ,M 


so  the  estimator  is 


N 

F  ixu)  =  A  /(wA)  exp  (/cam  A)  =  NA  Flicu).  (A17) 

m—  1 

for  leal  <  tt/A.  Here  Eq.  (A17)  makes  sense  for  every  ca,  not  just  for  a  finite  number,  as  before. 
Also,  it  is  eVA  //(ca)  itself  that  now  estimates  Fico).  We  can  also  see  more  clearly  the  role  of  zero¬ 
padding.  In  practice  we  cannot  actually  evaluate  Fiat)  at  all  values  |ca|  <  ir/A,  so  we  select  some  finite 
number.  Suppose  we  wani  N  +  K  values,  equispaced  within  ( —  rr/A  ,7r/A].  Then  the  calculations  are 
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exaciK  those  we  [lert'ormed  when  we  zero-padded  in  the  modelling  approach.  The  FFT  algorithms  are 
designed  ti)  compute  DFT  models  but  can  be  used  to  evaluate  F((o)  at  A'  +  equispaced  points  if  we 
pretend  that  we  had  ,V  -t-  A'  data  values  originally. 


F:xample  2:  Suppose  we  have  further  prior  information  that  Fiio)  =  0  for  Ito  I  >  (1 ,  for  some 
known  11  less  than  tt/A.  The  estimate  in  Eq.  (A17)  approximated  F(w)  on  [  —  7r/A,7r/A],  but  we 
know  now  that  outside  of  e  smaller  interval  [  —  11 ,11]  Flta)  =  0,  We  then  modify  Piw),  making  it  1 
if  loji  <  11 ,0  otherwise.  Now  Eq.  (A14)  becomes 

/  ( ni  A )  =  ^  A  ( /! ) 

I 


r  " 

J  exp  —  ot)A)  dw/iTT 


=  ^  bin) 
1 


sin  (11  im  —  n  )A) 
Trim  —  rtlA 


(A18) 


Solving  Eq.  (A18).  our  estimate  of  Fiw).  valid  for  |(o|  ft,  becomes 

V 

Fiai)  =  ^  bin)  exp  (/oj/nA),  (A19) 

m-l 


where  the  bhi)  are  i  generally)  not  equal  to  (l/A)/(wA). 

Example  3.  Consider  case  B  with  the  same  prior  knowledge  as  in  example  1.  Then  Eq.  (A  14) 
becomes 

/<!„)=  ^  bin)  J  exp  ii  (u  it„  -  !„))  d(jj/2n.  (A20) 

/I  -  1 

The  matrix  that  must  be  inserted  to  find  the  bin)  is  not  just  (1/A)/,  as  it  was  in  example  1  (unless,  of 
course.  i,„  =  wA  )  Consequently  our  estimate  of  Fioj).  valid  for  leal  <  tt/A.  will  be 

.  V 

Fioj)  =  ^  him)  exp  iicor„).  (A21) 

m—  I 

The  bim)  will  not  be  ( l/A),/'(r,„ )  generally,  but  some  other  value,  dependent  on  all  the  /it„).  In  par¬ 
ticular  Eq.  (A2I)  will  not  be  7((u)  unless  we  are  in  case  A. 

In  all  cases  we  have  considered  here  the  estimators  of  Ficu)  are  themselves  consistent  with  the 
data:  that  is,  if  we  apply  the  inversion  formula  Eq.  (A2)  and  check  the  values  at  the  r  —  or  !  —  rnS. 
we  get  the  data  values  back  again  This  can  be  troublesome  if  the  data  contain  some  additive  noise  and 
the  matrix  we  invert  is  as  in  example  2,  for  11  much  less  than  rr/A.  To  avoid  instability  in  such  cases. 
It  IS  advisable  to  increase  the  values  along  the  main  diagonal  by,  say,  lO'Ki,  before  inverting.  .A  more 
rigorous  treatment  of  stability  is  beyond  the  scope  of  this  appendix  (see  Refs.  A2  and  A3  and  any  other 
articles  on  "regularization)". 
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Appendix  B 

COMPLEX  GAUSSIAN  RANDOM  VARIABLES 

If  X  and  Y  are  real-valued  Gaussian  random  variables,  with  means  my  and  variances  arl,  o-  y, 
respectively,  then  the  complex-valued  random  variable  Z  ^  X  +  iY  may  not  have  any  of  the  useful 
mathematical  properties  that  we  associate  with  the  term  Gaussian.  For  that  reason  it  is  necessary  to 
make  further  assumptions  about  the  real  and  imaginary  parts  of  a  complex  random  variable  before 
calling  it  Gaussian.  Consider  here  only  the  single  variate  case;  the  more  general  case  is  treated  in  detail 
in  Goodman  [Bl]. 

With  no  further  assumptions,  it  is  the  case  that  the  mean  of  Z  is  ntz  =  ntx  +  inty.  The  variance, 
o-|,  is 

(Tz  =  E\Z  —  =  <Tx  -t-  O-y,  (Bl) 

regardless  of  possible  dependence  between  X  and  Y.  To  have  the  statistics  of  Z  completely  determined 
by  its  mean  and  variance  (as  in  the  real  case),  we  must  assume  that  X  and  Y  are  independent,  which 
implies  then  that  the  real  random  vector  (X,  Y)  is  multivariate  Gaussian  with  a  diagonal  correlation 
matrix. 

The  benefit  that  comes  from  using  Gaussian  statistics  is  due  to  the  pleasant  algebraic  properties  of 
the  density  function  exp  (-x^).  We  would  like  Z  to  have  a  probability  density  function  proportional  to 

p(z)  =  exp  (-|z  -  otJVo-j^)  ,  (B2) 

which  is 

p(z)  =  exp  (  -  [(x  -  mx)^  +  (y  -  my)^]/(a-x  +  erf)).  (B3) 

If  we  assume  that  o-|  =  <rf,  then  Eq.  (B3)  becomes 

piz)  =  exp  (-y(x  -  mx)V<Tx)  exp  (--^Cv  -  m^)Vcrf) ,  (B4) 

and  the  right-hand  side  of  Eq.  (B4)  is  (except  for  constants)  the  true  probability  density  function  for  Z. 
Therefore,  Z  will  have  a  density  of  the  form  Eq.  (B3)  provided  we  assume  X  and  Y  are  independent, 
and  o-f  =  trf.  In  the  multivariate  case,  similar  assumptions  are  made  about  the  covariance  matrix  of 
the  random  vector  of  real  and  imaginary  parts  [Bl]. 

We  say  then  that  Z  =  X  +  iY  is  a  complex  Gaussian  random  variable  if  X  and  Y  are  independent 
real  Gaussian  random  variables  with  the  same  variance.  It  then  follows  that  (using  Z  Z  —  mz-,  and 
similarly  for  X,  f ) 

(a)  IZl^  =  -F  F  is  o-|  X2-  where  xl  's  a  chi-squared  random  variable  with  two  degrees  of 
freedom; 

(b)  \Z\~JW^  is  (t/R,  where  R  is  a  Rayleigh  random  variable; 

(c)  If  Z  =  |Z|  exp  {i9),  then  tan  9  -  Y/X  is  a  Cauchy  random  variable; 

(d)  9  »  arc  lan(Y/X)  is  uniformly  distributed  over  (-ir,  tt). 

For  further  discussion  of  these  results  see  Refs.  B2  and  B3, 
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Appendix  C 

ARRAY  GAIN,  DETECTION,  AND  LIKELIHOOD  RATIOS 

The  basic  model  used  in  the  statistical  theory  of  detection  of  a  narrowband  source  is  that  the  data 
vector  X  obtained  from  the  array  either  has  entries 

x(m)  —  n(m),  m  =  I,  . . .  ,M,  (H(,) ,  (Cl) 

or 

x(m)  =  A  e  +  n(m),  (//]),  (C2) 

where  the  n(m)  are  independent  complex  Gaussian  random  variables  with  mean  0  and  variance  A 
is  complex  Gaussian  with  mean  0  and  variance  and  is  the  lime  delay  at  the  mth  sensor 
corresponding  to  a  single  frequency  planewave  with  arrival  angle  9  relative  to  broadside.  The  9  is  fixed, 
the  two  hypotheses  Hq  and  //[  are  the  only  ones  admitted,  and  the  problem  is  to  decide  which  one  it  is. 

If  we  simply  average  the  entries  x(m),  we  get 

ya=  ^  x(m);  (C3) 

m— \ 

if  Hq  is  true,  then  yo  is  complex  Gaussian  with  mean  0,  variance  pVA/,  and  if  f/|  is  true,  the  variance 
becomes 

^  (o-VA/^)  +  (pVA/).  (C4) 

m— 1 

This  is  fine  if  all  the  delays  are  t„,  =  0  (broadside  arrival),  because  then  the  variance  is  cr^  + 

However,  if  the  arrival  is  not  broadside,  the  variance  in  the  signal  component  can  be  much  less  than 
(T^.  The  significance  of  the  variance  lies  in  the  distribution  of  lyol^  •’  ^O'  's  times  a 

x\  r.v.  (chi-squared  with  two  degrees  of  freedom  having  mean  1  and  variance  2),  so  the  mean  of  lyoP 

is  (pVM).  If  //i,  then  the  mean  of  lyol^  >s  given  by  Eq.  (C4).  The  objective  is  to  increase  the  differ¬ 
ence  in  means  as  much  as  possible,  so  as  to  improve  detection.  For  the  off-broadside  cases  we  must 
process  the  data  differently  if  we  wish  to  detect  better. 

One  possibility  is  to  replace  yo  "'ith 

yi  =  ^  x{m)e^"'.  (C5) 

m— 1 

If  //o,  then  yi  is  still  complex  Gaussian  with  mean  0  and  variance  (p^AT).  If  then 

yi  —  .4  4- Af“'  (C6) 

m»l 

and  (assuming  signal  and  noise  are  independent)  yi  has  mean  0  and  variance  o-^  +  (p^/M).  The  r.v. 
lyi^  I  is  then  a  multiple  of  xi  >n  either  case,  with  mean  either  (p^/M)  (if  Hq)  or  cr^  +  (p^M)  (if  Hi). 
This  selective  reduction  of  noise  mean  only  is  what  is  meant  by  array  gain;  comparing  input  S/N  crVp^ 
;■>  output  SNR  (rV(pVA/)  we  have  the  improvement  in  S/N  of  M,  hence  an  array  gain  of  M.  What 
ma  es  achievement  of  this  gain  possible  is  the  ability  to  sum  coherently  the  signal  components  in  Eq. 
(C5),  that  is,  to  correct  for  the  phase  differences  in  the  Ae  terms  prior  to  summing. 
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Careful  analysis  of  what  was  just  done  shows  that  there  are  other  options  besides  yi  in  Eq.  (C5). 
If  6  —  (6(1),  ...  ,  b{M))^  is  any  complex  vector  such  that 


then  we  can  compute 


£  6(w)c""  -  1 


3^2 "  ^bTm)x(m) 

m-*l 


and  get  the  same  gain  as  before.  There  would  be  no  point  in  this  if  the  only  cases  that  needed  to  be 
considered  were  Hq  and  //[  and  we  could  assume  that  the  nim)  are  always  mutually  independent.  In 
most  cases  of  interest,  however,  there  can  be  signals  with  arrival  angles  other  than  9  present  in  x  and 
the  noises  may  be  correlated  between  pairs  of  sensors.  We  can  handle  these  more  complicated  cases  by 
using  the  freedom  to  select  various  6,  constrained  by  Eq.  (C7),  but  chosen  to  optimize  such  things  as 
sidelobe  and  noise  suppression. 

Note  that  our  definition  of  array  gain  is  based  on  the  assumption  that  the  n(m)  is  independent 
(uncorrelated).  Because  array  gain  depends  on  input  and  output  S/N,  it  is  a  function  of  the  actual 
noise  field  and  of  the  vector  b.  We  must  also  note  that  if  there  are  sources  present  with  arrival  angles 

other  than  0,  our  procedure  may  or  may  not  give  us  gain  against  those  components.  It  depends  on  how 

b  is  designed. 

Let  us  suppose  that  N  =  [E(n(m)  wO))l  is  the  noise-only  correlation  matrix.  If  we  use  3^2  Eq. 
(C8),  then  the  variance  of  y}  is  either  b'*'  Nb  (Hq)  or  b'^Sb  (//]),  where  +  denotes  conjugate 
transpose  and  we  have  assumed  Eq.  (C7).  The  input  S/N  is  and  the  output  S/N  is  <T^/b*Nb, 
hence  the  gain  obtainable  from  b  will  be 

gain(6)  =  pVb^Nb.  (C9) 

To  maximize  our  gain  in  this  case,  we  select  b  so  that  b'^Nb  is  minimized,  subject  to  Eq.  (C7),  which 
we  write  as  b^e  =  1,  €  =  (exp  (/tj),  . . .  ,  exp  The  solution  is  then 

b  ~  XN-'e,  X  =  l/e+A^-'e,  (CIO) 

and  the  optimal  array  gain  available  is 

optimal  gain  =  p^{e^N~^e) .  (Cl  1) 

If  N  ’^p^I,  then  Eq.  (Cll)  becomes  Af,  as  before.  The  maximum  gain  possible  in  this  situation 
{N  5^  p^l)  will  depend  on  the  spatial  distribution  of  noise  energy;  e^N~^e  will  be  small  if  P  is  in  a  sec¬ 
tor  where  the  noise  is  concentrated.  In  the  extreme  case  in  which  the  noise  consists  almost  entirely  of 
a  single  source  at  an  angle  near  9  the  gain  will  be  almost  zero. 

All  of  the  above  analysis  assumed  tacitly  that  the  b  remains  the  same,  whether  Hq  or  H\  is  true; 
the  definition  of  array  gain  depended  on  the  same  b  being  used  in  each  case.  For  nonlinear  bearing 
estimation  methods,  such  as  maximum  entropy  and  maximum  likelihood,  we  must  be  careful  when  we 
describe  them  in  terms  of  filters,  as  in  Eq.  (C8).  The  b  used  in  each  of  those  methods  is  a  function  of 
the  measured  cross  sensor  correlation  matrix  changes  as  we  pass  from  Hq  to  Hi-  Trying  to  speak  of 
array  gain  in  such  cases  leads  to  a  lot  of  confusion,  and  much  of  what  has  been  said  in  the  literature 
about  these  situations  should  be  ignored.  To  analyze  nonlinear  methods  properly,  we  must  either  drop 
the  idea  of  array  gain  or  substitute  some  other  notion  that  quantifies  the  statistical  advantage  we  hope 
to  achieve.  Array  gain  is  about  statistical  behavior  and  it  is  only  the  latter  that  continues  to  concern  us 
when  using  nonlinear  methods. 
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In  most  array  processing  situations  we  have  many  independent  realizations  of  vector  x.  The  max¬ 
imum  detection  probability  for  a  fixed  false  alarm  rate  is  obtained  by  placing  a  threshold  on  the  likeli¬ 
hood  ratio;  the  Gaussian  assumptions  tell  us  that 

p{x\Hq)  —  exp  and 

pixlH])  —  exp  (-x^ia-^ee'^  +  N)~^x), 

so  we  maximize  the  (log  of  the)  likelihood  ratio 

np(x|//|)/np(jf|//o),  (Cl  2) 

where  the  product  is  over  all  realizations  of  x  that  we  have  obtained.  Taking  logs  of  Eq.  (Cl 2)  and 
using  the  identity 

iaa^  fl)-'  -  fl-'  -  (B-'a)(B-'a)y(\  +  a^B-'a),  (C13) 

we  see  that  we  must  detect  based  on  the  size  of  the  quantity  e*N~'RN~^e,  where  R  is  the  average  of 
XX over  all  available  x. 

Using  e* Re,  the  so-called  conventional  processor,  is  optimal  within  the  narrow  confines  of  the 
two-hypothesis  situation  we  have  been  using,  with  N  =  If  we  wish  to  improve  sidelobe  structure, 
we  must  include  potential  sources  at  other  angles  within  the  noise  component  N . 
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